PREFACE

This thesis presents the results of an extensive literature
search in the area of robust estimation techniques. Presently
there is no formal text available on the market that gives more
than an introductory look at robust estimation techniques. Most
of the theory developed in this area has been presented only in
statistical journals and technical reports. The intent of this
thesis is to present, in a concise manner, a survey of several
of the most current and useful techniques.

Robust  estimators  were  chosen  which  were  theoretically 
and  computationally  tractable  so  that  they  could  be  easily 
understood  by  a  practicing  analyst  or  scientist.  Section  III 
contains  a  descriptive  analysis  of  the  chosen  estimators 
and  is  followed  by  an  extensive  analysis  of  their  performance 
using  Monte  Carlo  techniques  in  Section  IV. 

The  method  of  presentation  assumes  a  basic  understanding 
of  the  principles  of  probability  and  statistics.  All  material 
is  presented  as  simply  and  concisely  as  possible.  It  was 
intended  that  the  estimators  chosen  for  study  would  be 
ones  whose  overall  performance  was  good  and  which  lent 
themselves  toward  application  fairly  easily. 




GSA/MA/72-3


I  wish  to  thank  my  thesis  advisor  Professor  A.  H.  Moore 
whose  guidance  contributed  significantly  to  the  completion 
of  this  study  and  also  my  thesis  reader  Major  Ronald  J. 
Quayle  for  his  patience  and  direction. 

John  Caso 




CONTENTS

Preface
Abstract
I.  Background
II.  Introduction
III.  Types of Robust Estimators
    Estimators Which are Special Symmetrical Linear Combinations of Order Statistics
        Winsorized Means
        Trimmed Means
    Estimators Which are not Strictly Functions of Order Statistics
        Hodges-Lehmann Estimator
        Huber's Estimator
    Quasilinear Estimators
        Switzer's Estimator
        Hogg's Estimator
IV.  Monte Carlo Analysis
    Probability Distributions Used
        Rectangular
        Triangular
        Normal
        Contaminated Normal
        Double Exponential
    Estimators Considered
    Computations
V.  Conclusions
    Areas for Further Investigation
Bibliography
Appendix A:  Supplemental Bibliography
Appendix B:  Computer Program Listing
Appendix C:  Graphs of Data
Appendix D:  Tables of Relative Efficiencies




ABSTRACT 

Several robust estimators were considered for analysis and
explanation. Monte Carlo techniques were used to investigate
the efficiency of these robust estimators relative to the best
estimator for the distribution under consideration. Samples of
size 12 and 24 were drawn 4200 times from five symmetric
probability distributions. The results showed that over a class
of distributions the robust estimators provided a higher
guaranteed efficiency than the best estimator for any
particular distribution in the family. Some interesting results
are apparent from an analysis of the graphs in Appendix C,
indicating some upper bounds on the size of the Monte Carlo
sample when conducting this type of study.

I.  BACKGROUND


Throughout many areas of scientific investigation a storehouse
of information and techniques has been established as a result
of previous research and experimentation. This previous
research and experimentation, coupled with an ability which
exists in many disciplines to isolate a situation for the
purpose of observation, has been an invaluable aid to the
experimenter when testing a hypothesis. During this century all
areas of science have turned at one time or another to
mathematical statistics as an aid to scientific investigation.
In the last quarter century extensive empirical investigation
has given way almost completely to statistical inference and
statistical testing of hypotheses. Some disciplines benefit
more than others from this technique. Consider a continuum with
the exact physical sciences positioned at the far left extreme
and the inexact social sciences positioned at the right
extreme. As one progresses from left to right, one notices a
marked decrease in the degree of confidence that can be placed
upon statistical estimates. The physicist and chemist at the
far left side of the continuum have an abundance of empirically
supportable evidence on which to base assumptions concerning
the distribution of the population from which they are
sampling. This empirical support decreases rapidly as one moves
from left to right along this continuum.

Located at a point somewhere right of center on the continuum
is a relatively new discipline, Systems Analysis, of which the
author of this paper is a student. The basic tools of Systems
Analysis are mathematics and mathematical statistics along with
many of the techniques of Operations Research. A concise,
definitive explanation of Systems Analysis does not appear to
be available and may not even be possible. In a vague sense
Systems Analysis attempts to combine pieces of information
about a large or small system, which can be disjoint and
totally unrelated, and to draw inferences on which to base
conclusions so that a decision may be made or a course of
action plotted.

Measures of central tendency, e.g., mean, mode, and median, are
usually important statistics in all areas of investigation.
Estimates of these measures are usually made based on
assumptions concerning the distribution of the sampled
population. For the reasons stated earlier the sciences close
to the left of the continuum have relatively little trouble in
determining the form of the underlying distribution of a
sample. Now consider the plight of the Systems Analyst outlined
in the following hypothetical situation.

A Department of Defense analyst is asked to estimate the total
number of nuclear submarines and/or conventional submarines
required to effectively defend the coastal United States from
attack. Whether a point estimate or an interval estimate is
required is immaterial since the same difficulties will exist
in either case. There might be a large number of individual
estimates which could be combined to determine the overall
estimate. For example, an estimate of the average speed of
conventional and nuclear submarines would probably be required.
It would be necessary to determine the amount of ocean area
that these submarines could cover per unit of time. The natural
tendency would be to take some random observations of the
cruising speed of both types and then to compute the arithmetic
mean. This statistic is known to be reliable when the sample is
drawn from a normally distributed population. But what if this
assumption were not justified? The resultant error in most
cases would be small but could be catastrophically large in
certain cases. Let us suppose the error was small. Consider
now, however, a possible one thousand plus individual estimates
that might be used in determination of the optimal force size.
How much confidence could one have in this estimate when a
small error may be compounded one thousand times?

In problems of this type assumptions about population
distributions are difficult to make because of a usually small
number of observations but even more so as a result of the
uniqueness of each problem.

With  this  background  in  mind  this  thesis  will  examine 
some  of  the  recent  innovations  in  statistical  estimation 
theory.  An  attempt  will  be  made  wherever  possible  to 
present  the  statistics  considered  in  a  manner  which  lends 
itself  toward  application  of  these  statistics  as  opposed  to 
a  purely  theoretical  approach  that  may  be  of  interest  only 
to  the  theoretical  statistician.

II.  INTRODUCTION


Possibly the most important problem of statistical inference is
the estimation of parameters (such as population mean,
variance, etc.) from the corresponding statistics (i.e., sample
mean, variance, etc.). The theory of estimation originated with
problems where almost all of the statistical variability is due
to measurement errors. This situation should be clearly
distinguished from the opposite case where the data show a
large internal variability. It is interesting to note that
Gauss introduced the normal distribution to provide an
asymptotic distribution for the sample mean. That is, the
statistic existed before the theory for the normal distribution
was developed. Throughout the years the use of the arithmetic
mean has become almost sacred even though it could easily have
been designed in some other form, for example omitting the
three largest observations. This dogmatic use of the sample
mean caused experimenters to be ignorant of the high
sensitivity to deviations from normality of some of these
standard procedures. In the late 1940's distribution-free
procedures brought relief to some of these estimation
difficulties. More significant advances in this area were made
throughout the 1960's. It was recognized that one never really
has a very accurate knowledge of the true underlying
distribution and that the performance of some of the classical
estimates is very unstable under small changes in the
underlying distribution.

This paper will be confined to a study of estimation of
location parameters. Throughout this paper the location
parameter will denote the center of symmetry of a symmetric
distribution on the real line. When the density function of a
distribution is well specified there are usually several
methods available to obtain large sample estimators of the
location parameter. For example, if $f$ (the density function)
is Uniform then the midrange is an efficient estimator of
$\lambda$ (the location parameter), or if $f$ is Double
Exponential (Laplace) then the median is an asymptotically
efficient estimator of $\lambda$. This study will primarily be
concerned with estimates of location parameters when the exact
form of the underlying distribution is not known.
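
The dependence of the "best" estimator on the density can be checked with a small simulation. The following is a minimal sketch and is not taken from the program listed in Appendix B; the function names, the seed, the sample size, and the use of Python's standard library generators are all assumptions made for illustration. It draws repeated samples centered at zero and compares the Monte Carlo variance of the mean, median, and midrange.

```python
import random
import statistics


def midrange(sample):
    """Average of the smallest and largest observations."""
    return (min(sample) + max(sample)) / 2.0


def laplace(rng, scale=1.0):
    """Draw one double exponential (Laplace) variate centered at 0.

    The difference of two independent exponential variates with the
    same rate follows a Laplace distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def estimator_variances(draw, n=24, reps=4200, seed=1):
    """Monte Carlo variance of three location estimators.

    draw(rng) must return one observation from a distribution that is
    symmetric about 0, so every estimator is estimating 0.
    """
    rng = random.Random(seed)
    results = {"mean": [], "median": [], "midrange": []}
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        results["mean"].append(statistics.fmean(sample))
        results["median"].append(statistics.median(sample))
        results["midrange"].append(midrange(sample))
    return {name: statistics.pvariance(vals) for name, vals in results.items()}


# Illustrative runs: a rectangular density on (-1, 1) and a Laplace density.
uniform_var = estimator_variances(lambda rng: rng.uniform(-1.0, 1.0))
laplace_var = estimator_variances(laplace)
```

Under these assumptions the midrange should show the smallest variance for the rectangular samples and the median the smallest for the Laplace samples, which is the pattern the paragraph above describes.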

Statistical methods which are relatively insensitive to
assumptions about their underlying distributions have been
termed robust methods (Ref 3:169). This term has been extended
to include estimators which have been specifically designed for
estimation when the form of the underlying distribution is not
known but some characteristic of the family of which the
underlying distribution is a member is known. For example, we
may know that the density function is symmetric. These are
called robust estimators. The main idea is that these
estimators have been specifically designed for this purpose and
we are not merely investigating the statistical robustness of
an existing estimator for a known distribution. This study has
been limited to estimators of location parameters and does not
consider the estimation of scale parameters. Huber (Ref 8:93)
discusses at some length the unsatisfactory aspects of
attempting to estimate a scale parameter. He summarizes the
reasons why this author and most statisticians have avoided
this area.


"The theory of estimating a scale parameter is less
satisfactory than that of estimating a location parameter.
Perhaps the main source of trouble is that there is no
natural "canonical" parameter to be estimated. In the case
of the location parameter, it was convenient to restrict
attention to symmetric distributions; then there is a
natural location parameter, namely the location of the
center of symmetry, and we could separate difficulties by
optimizing the estimator for symmetric distributions (where
we know what we are estimating) and then investigate the
properties of this optimal estimator for non standard
conditions, e.g., for nonsymmetric distributions.

In the case of the scale parameter, we meet, typically,
highly asymmetric distributions, and the above device to
ensure unicity of the parameter to be estimated fails.
Moreover it becomes questionable, whether one should
minimize bias or variance of the estimator.

So we shall just go ahead and shall construct estimators
that are invariant under scale transformations and that
estimate their own asymptotic values as accurately as
possible. Of course one has to check afterward in a few
typical cases what these estimators really do estimate".


Specifically  this  thesis  has  two  purposes. 

1.  To  provide  a  survey  of  the  current  techniques  involved 
in  robust  estimation  of  a  location  parameter  of  a 
symmetric  probability  distribution. 

2.  To explore, using Monte Carlo techniques, the performance
of some selected robust estimators of a location parameter.
The purpose of this investigation will be to ascertain
what benefit, if any, can be achieved through the use
of robust estimators making no assumptions about the
specific form of the underlying probability distribution,
as opposed to employing known estimators for a
predetermined probability distribution.

To achieve these objectives an extensive search and study was
made of the available literature. The results are presented in
this thesis. The bibliography lists those sources which contain
much of the applicable information on robust estimation and
which were used directly in the formulation of this paper.
Appendix A is a supplemental bibliography listing those sources
which were applicable to robust estimation techniques but were
not available, which apply only to statistical robustness in
general, or which were sources of a general nature useful in
the formulation of this paper.

III.  TYPES  OF  ROBUST  ESTIMATORS

The  purpose  of  this  section  is  to  present  in  a  summarized 
form  several  robust  estimators  which  have  been  developed 
for  the  purpose  of  estimating  the  center  of  symmetry  of 
an  unspecified  distribution.  The  estimators  considered 
will  be  of  two  basic  types.  One  type  has  the  characteristic 
that  the  functional  form  of  the  estimator  does  not  depend  on 
the  sample  while  the  other  type  has  the  actual  functional  form 
of  the  estimator  determined  by  the  information  contained 
in  the  sample. 

Pioneer efforts in robust estimation were mainly concerned with
departures from the assumptions of normality. By appealing to
the central limit theorem many distributions can be considered
to be approximately normal. However it is easily demonstrated
that even a slight departure from the assumption of normality
can often cause the sample mean to behave badly as an estimator
of the location parameter. Early studies were devoted primarily
to estimation methods where the underlying distribution was the
standard normal but was contaminated in some manner by a
distribution, usually normal, with a larger amount of
dispersion. More recent inquiries consider situations involving
more varied sets of distributions. In most cases only symmetric
unimodal distributions are considered. Tukey (Ref 13:30)
emphasizes this restriction when he considers the treatment of
"spotty data".

"Accordingly it will be for us to begin with long
tailed distributions which offer the minimum of
doubt as to what should be taken as the true
value. If we stick to symmetric distributions
we can avoid all difficulties of this sort ....
No other point on a symmetrical distribution
has a particular claim to be considered the
true value. Thus we will do well by restricting
ourselves to symmetric distributions".

This  quote  is  presented  here  because  throughout  all  the 
more  recent  papers  dealing  with  robust  estimation  the 
restriction  of  a  symmetrical  distribution  appears  to  have 
been  strictly  adhered  to  and  the  Tukey  (Ref  13:1)  paper 
given  as  the  reference  source.  This  quote  will  also 
provide  some  justification  for  the  structure  of  the  Monte 
Carlo  analysis  presented  in  the  next  section  of  this  thesis. 

The first robust estimators considered here will be of the type
where the specific form of the estimator does not depend on the
information contained in the sample. The functional form of
these estimators will be structured as order statistics. This
type of estimator was analyzed rather extensively by Crow and
Siddiqui. A class, F, of distributions is considered which
consists of the normal, Cauchy, parabolic, triangular, and
rectangular distributions. The results presented claim that
asymptotic efficiencies of at least .82 relative to the best
estimator for a single distribution are achieved by the best
trimmed mean or linearly weighted mean (Ref 3:353).

Estimators  Which  are  Special  Symmetrical  Linear  Combinations 
of  Order  Statistics 

Two estimators of this type will be considered here: the
winsorized mean and the trimmed mean. These estimators have
been available for many years but were used very sparingly. The
theoretical basis for winsorized and trimmed means is the
technique called rejection of outliers. Trimming removes equal
numbers of the highest and lowest observations and then
proceeds with the remainder as if it were a complete sample. If
the samples do come from a normal distribution there will be
some loss in efficiency, and there will be an increase in
efficiency when the samples are from a distribution with long
tails.

Let $X_1 \le X_2 \le \dots \le X_n$ be order statistics
resulting from random sampling of $F(x - \lambda) \in G$ and
subsequent ordering. $G$ consists of distributions which are
symmetric about the median $\lambda$ and such that $\lambda$ is
the unique mode. The amount of trimming or winsorizing $p$ is
determined here such that

$$ p = r/n \tag{3-1} $$

where $r$ is a non-negative integer less than $n/2$.

Winsorized Means (Ref 13:1).

$$ W_n(p) = \frac{1}{n}\left[ (r+1)\left(X_{r+1} + X_{n-r}\right) + \sum_{i=r+2}^{n-r-1} X_i \right] \tag{3-2} $$

and if $n = 2\nu + 1$

$$ W_n\left(\tfrac{1}{2} - \tfrac{1}{2n}\right) = X_{\nu+1} \tag{3-3} $$

where $r = (n-1)/2$.

Trimmed Means (Ref 13:1).

$$ T_n(p) = (n - 2r)^{-1} \sum_{i=r+1}^{n-r} X_i \tag{3-4} $$


In a normal sample winsorized means are more stable than
trimmed means. The possible loss in efficiency through the use
of these estimators is far overshadowed by the large possible
gain when the assumption of normality is violated. Many papers
published in the area of robust estimation have dealt with
linear combinations of order statistics (Ref 1, 3, 4, 5) and in
many cases much of the analysis centered around winsorized and
trimmed means.
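
The trimmed and winsorized means can be sketched in a few lines. This is a minimal illustration, assuming the amount of trimming is given as the integer count r of observations affected in each tail; the function names and the sample values in the note that follows are hypothetical and do not come from the thesis.

```python
def trimmed_mean(sample, r):
    """Mean of the sample after dropping the r smallest and r largest values."""
    n = len(sample)
    if not 0 <= r < n / 2:
        raise ValueError("r must satisfy 0 <= r < n/2")
    ordered = sorted(sample)
    kept = ordered[r:n - r]
    return sum(kept) / len(kept)


def winsorized_mean(sample, r):
    """Mean after pulling the r smallest values up to the (r+1)-th smallest
    and the r largest values down to the (r+1)-th largest."""
    n = len(sample)
    if not 0 <= r < n / 2:
        raise ValueError("r must satisfy 0 <= r < n/2")
    ordered = sorted(sample)
    low, high = ordered[r], ordered[n - r - 1]
    # Values outside [low, high] are replaced by the nearest bound.
    clipped = [min(max(x, low), high) for x in ordered]
    return sum(clipped) / n
```

For a made-up sample such as [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 9.8] with r = 1, both functions discount the single large value that pulls the ordinary mean upward. With an odd sample size and r = (n-1)/2 the winsorized mean reduces to the sample median.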

For an in-depth discussion of this area see the paper by
Gastwirth and Rubin (Ref 5). Gastwirth and Rubin demonstrate
that within a large class of estimators there is a unique
maximum efficient linear estimator. The difficulty of
determining a maximum efficiency linear estimator for specific
families of densities is emphasized and the paper is restricted
to searching for maximum efficient estimators in smaller
classes of linear estimators such as the trimmed means and
linear combinations of a few sample percentiles.

Estimators Which are not Strictly Functions of Order
Statistics

This section will develop the estimators in the same manner as
the previous section. While many of these estimators, which
make up the greater part of this study, do utilize order
statistics, they are not considered strictly functions of order
statistics as were the estimators in the previous section.
These estimators were chosen from the many which exist today
for several reasons. First, they have been shown in some
previous studies to possess a high relative efficiency over
fairly broad classes of distributions. Secondly, they are, in
most cases, theoretically simple to comprehend and
computationally tractable to apply.

Hodges-Lehmann Estimator (Ref 6).

This estimator was one of the earlier attempts at the
development of a robust estimator and from most results in the
literature appears to be one of the best estimators. Results
obtained by Bickel (Ref 1) indicate that, in terms of
robustness, the Hodges-Lehmann estimate is superior to the
trimmed and winsorized means. It is simple in form and
computationally easy to handle. Hodges (Ref 6) defined this
estimator of the location parameter in terms of rank test
statistics such as the Wilcoxon or Normal scores statistic.

Let $X_1, X_2, \dots, X_n$ be a random sample from an unknown
symmetric probability distribution. Then

$$ HL[X_1, X_2, \dots, X_n] = \operatorname{med}_{i \le j} \left[ \frac{X_i + X_j}{2} \right], \qquad i, j = 1, 2, \dots, n \tag{3-5} $$

This estimator is formed by taking the median of the means of
all pairs of sample items. In the Hodges-Lehmann paper (Ref 6)
it is shown that the estimates are symmetric with respect to
the parameter being estimated and thus unbiased if the
underlying distribution of the observations on which the
estimate is based is symmetric. The form of the estimator makes
it the only practically tractable estimator derived from the
rank tests. When the sample gets very large, however, the
number of steps involved becomes prohibitive. An alternate form
of this estimator which uses ordered samples and is much
quicker to compute for large samples has been shown to be good
in certain situations.

Let $X_1 \le X_2 \le \dots \le X_n$ be an ordered random
sample.

$$ HL2[X_1, X_2, \dots, X_n] = \operatorname{med}_{j} \left[ \frac{X_j + X_{n+1-j}}{2} \right] \tag{3-6} $$
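
Both forms translate directly into code. The sketch below is illustrative only: the function names are hypothetical, pairs with i equal to j are included in the first form, and the index j in the second form is assumed to run over the first half of the ordered sample (including the middle item when n is odd), since the remaining values of j would only repeat the same pairs.

```python
import statistics


def hodges_lehmann(sample):
    """Median of the pairwise means (X_i + X_j)/2 over all pairs with i <= j."""
    n = len(sample)
    pair_means = [(sample[i] + sample[j]) / 2.0
                  for i in range(n) for j in range(i, n)]
    return statistics.median(pair_means)


def hodges_lehmann_ordered(sample):
    """Median of the means of symmetrically placed order statistics."""
    ordered = sorted(sample)
    n = len(ordered)
    half = (n + 1) // 2  # assumed range for j; later indices repeat pairs
    folded = [(ordered[j] + ordered[n - 1 - j]) / 2.0 for j in range(half)]
    return statistics.median(folded)
```

The first function forms n(n+1)/2 pairwise means, which is the source of the computational burden noted above for large samples; the second needs only about n/2 means after a single sort.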


Huber's Estimator (Ref 8).


Huber deals with the asymptotic theory of estimating a location
parameter. The emphasis in this paper was placed on treating
contaminated normal distributions. There seems to be some
discussion over just how well this estimator actually performs.
It is presented here in summary form mainly because Huber does
attempt to design a robust estimator of the scale parameter.
Basically Huber considers the method of least squares where the
idea is to minimize the expression

Solutions to these equations for the location and scale
estimates, along with an iterative computational procedure, can
be found in Leone (Ref 10). Leone performed a Monte Carlo study
using contaminated normal distributions, drawing a sample of
size 20. The sample was drawn 500 times. The results indicate
that the variances of the estimators increase with an increase
in the scale parameter of the contaminating distribution but
are less sensitive to a change in the location parameter.
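
As an illustration of the kind of estimator discussed in this subsection, the following is a minimal sketch of a Huber-type location estimate. It is a simplified stand-in and not the simultaneous location and scale procedure treated by Huber (Ref 8) and Leone (Ref 10): the scale is assumed to be fixed in advance at the rescaled median absolute deviation, the tuning constant k = 1.345 is a conventional choice rather than a value taken from the thesis, and the function name is hypothetical.

```python
import statistics


def huber_location(sample, k=1.345, tol=1e-8, max_iter=100):
    """Huber-type M-estimate of location with a fixed, MAD-based scale.

    Observations within k*scale of the current estimate get full weight;
    more distant ones are down-weighted in proportion to their distance.
    """
    center = statistics.median(sample)
    mad = statistics.median([abs(x - center) for x in sample])
    scale = mad / 0.6745  # rescales the MAD to match sigma for normal data
    if scale == 0.0:
        return center  # at least half the data coincide; nothing to iterate
    cutoff = k * scale
    for _ in range(max_iter):
        weights = [1.0 if abs(x - center) <= cutoff
                   else cutoff / abs(x - center) for x in sample]
        updated = sum(w * x for w, x in zip(weights, sample)) / sum(weights)
        if abs(updated - center) < tol:
            return updated
        center = updated
    return center
```

The iteration is a reweighted mean: squared error is used for observations near the current estimate and a linear penalty for distant ones, so a single wild observation moves the estimate far less than it moves the sample mean.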

The  remaining  estimators  to  be  considered  in  this  section 
are  the  type  whose  functional  form  is  determined  by  the 
information  contained  in  the  sample.  Takeuchi  (Ref  12:292) 
called  this  type  quasilinear  estimators. 

Quasilinear Estimators

The estimators of this type considered here will normally use
some known statistic for information on which to base a choice
between several competing functional forms. The various
functional forms used in the following estimators were chosen
on the basis of some very weak assumptions about the general
form that the underlying distribution might possess. Thus it
should be clear that these choices are merely examples, and the
intent of the analysis here is to emphasize the great
flexibility available in choosing these competing functional
forms, which makes these very powerful estimators.

Keeping the notation consistent with that used previously, $G$
will denote a family of distributions which are symmetric about
the median $\lambda$ and such that $\lambda$ is the unique
mode.

Switzer's Estimator (Ref 11).

The method employed here is to choose from a set of competing
estimators that estimator which has the minimum standard error
for the sample being considered. The forms of the competing
estimators are predetermined but which one is chosen is
determined by the information contained in the sample. In
formulating this estimator Switzer outlines two fairly loose
restrictions on the set of competing estimators from which to
choose:


1.  that the competing estimators be such that their
standard errors can also be estimated without making
use of the unknown shape of the underlying distribution
and,

2.  that the collection should contain only estimators
whose efficiency relative to one another ranges from
very small to very large numbers as the distributions
range over a set of reasonable possibilities.

Estimators are chosen in the manner described for each
available sample of equal size and a sequence of estimates of
the location parameter is obtained. Switzer chose to limit the
number of competing estimators to three. This convention will
be continued throughout this thesis. However it is apparent
that this principle could be extended to include a larger
number of competing estimators, as suggested by Switzer at the
close of his paper.

Let $\hat{\lambda}_1, \hat{\lambda}_2, \hat{\lambda}_3$ be three
sequences of competing estimates obtained from three selected
estimators which are defined for every sample size $N$. Let
$S_1, S_2, S_3$ be non-parametric estimates of the standard
errors.


Then the recommended estimator is:

$$ SW = \hat{\lambda}_j \quad \text{where} \quad S_j = \min(S_1, S_2, S_3) $$

It is assumed here that $\sqrt{N}(\hat{\lambda}_j - \lambda)$
has a limiting normal distribution with zero mean for
$j = 1, 2, 3$ and that the standard error estimates were chosen
so that $N S_j^2$ consistently estimates the variance of this
limiting distribution for each $j$ and each $f$ belonging to a
large class $G$. If $\hat{\lambda}_{j(f)}$ is the most
efficient of the three competitors for a given $f$ then
$\sqrt{N}(SW - \lambda)$ has the same limiting distribution as
$\sqrt{N}(\hat{\lambda}_{j(f)} - \lambda)$ for all $f \in G$.

Switzer outlines two general procedures for obtaining
non-parametric estimates of the standard errors of the
competing estimators. Only one procedure will be presented
here. It is a two step procedure.

Step 1.  Assume the sample can be divided into $K$ blocks of
equal size $n$.

Step 2.  Compute the three competing estimates for each block,
based on samples of size $n$,

$k = 1, 2, \dots, K$;  $i = 1, 2, 3$


k 

f;  ■  Z  Vk 
1 


(3-17) 
and


(3-18) 


Specifically, Switzer chose for study an N (sample size) which
was divisible by six and computed the three mid ranges.


Xj3)+X[^,| 


(3-19) 


(3-20) 


k=1,2,.  .  K 


(3-21) 
The results of the Monte Carlo analysis performed by Switzer
using his estimator as previously outlined showed that the SW
estimator performed very well when samples of size 30, 60, and
120 were drawn from short, normal, and long tailed
distributions. In each case the SW estimator was not quite as
good as the best estimator (sample mean, median, mid range,
etc.) for that particular shape. It was shown, however, to
always have less variance than the other estimators considered
for that shape.
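
The selection principle can be sketched as follows. This is a hedged illustration of the general idea only and does not reproduce Switzer's formulas: the three competitors (mean, median, midrange), the block-based standard error, the default number of blocks, and all function names are assumptions made for this example.

```python
import statistics


def midrange(sample):
    """Average of the smallest and largest observations."""
    return (min(sample) + max(sample)) / 2.0


# Hypothetical set of three competing estimators.
COMPETITORS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "midrange": midrange,
}


def block_standard_error(estimator, sample, n_blocks):
    """Estimate an estimator's standard error from equal-sized blocks.

    The estimator is evaluated on each block; the spread of the block
    values, divided by sqrt(n_blocks), approximates the standard error
    of their average. Assumes the sample is in random order.
    """
    size = len(sample) // n_blocks
    blocks = [sample[i * size:(i + 1) * size] for i in range(n_blocks)]
    values = [estimator(block) for block in blocks]
    return statistics.stdev(values) / n_blocks ** 0.5


def select_estimate(sample, n_blocks=4):
    """Return (name, estimate) for the competitor with the smallest
    estimated standard error."""
    if n_blocks < 2 or len(sample) % n_blocks != 0:
        raise ValueError("sample size must be a multiple of n_blocks >= 2")
    errors = {name: block_standard_error(est, sample, n_blocks)
              for name, est in COMPETITORS.items()}
    best = min(errors, key=errors.get)
    return best, COMPETITORS[best](sample)
```

Because the standard errors are estimated from the sample itself, no assumption about the shape of the underlying distribution enters the choice, which is the first of the two restrictions listed earlier.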


This estimator is very interesting because of the large
number of possibilities it presents and very appealing because
of its extreme simplicity. Hogg uses the kurtosis of the
sample to determine which form the estimator should take,
kurtosis here being defined as the fourth central moment
divided by the square of the variance. The sample kurtosis

$$k = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^4}{\left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]^2} \tag{3-22}$$

where $n$ is the sample size and $\bar{X}$ is the sample
mean, converges in probability to the kurtosis of the
underlying distribution of the sample. Hogg subsequently
structured his estimator in the following manner.

$$HG = \begin{cases} \bar{X}_{1/4}, & k < 2 \\ \bar{X}, & 2 \le k \le 4 \\ \bar{X}_{1/2}, & 4 < k \le 5.5 \\ M, & 5.5 < k \end{cases} \tag{3-23}$$

where

$\bar{X}_{1/4}$ is the mean of the n/4 smallest and n/4
largest items of the sample.

$\bar{X}_{1/2}$ is the mean of the remaining interior sample items.

$\bar{X}$ is the sample mean.

$M$ is the sample median.
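The selection rule can be illustrated with a short Python
sketch. The function names are hypothetical, and the handling
of samples whose size is not a multiple of four (integer
division) is an assumption; the cut points 2, 4, and 5.5 are
those of equation 3-23.

```python
import statistics


def sample_kurtosis(x):
    """Fourth central moment divided by the squared variance."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2


def hogg_estimate(x):
    """Choose the form of the estimator from the sample kurtosis."""
    if len(x) < 4:
        raise ValueError("need at least four observations")
    xs = sorted(x)
    n = len(xs)
    q = n // 4  # assumption: integer part of n/4
    k = sample_kurtosis(xs)
    if k < 2.0:
        outer = xs[:q] + xs[n - q:]  # n/4 smallest and n/4 largest
        return sum(outer) / len(outer)
    if k <= 4.0:
        return sum(xs) / n  # sample mean
    if k <= 5.5:
        inner = xs[q:n - q]  # remaining interior items
        return sum(inner) / len(inner)
    return statistics.median(xs)
```

For example, the twelve equally spaced values 0, 1, ..., 11
have a kurtosis of about 1.8, so the mean of the three
smallest and three largest values is used.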


The many possibilities of this estimator should now be
apparent to the reader, for there is really no restriction
on the possible ranges of k or the choice of forms for
the estimator. Because it is based solely on the kurtosis of
the sample, this estimator could prove very useful if its
performance were good. Hogg performed a Monte Carlo
analysis in which the performance of his estimator was
compared mainly with the performance of the Hodges-Lehmann
estimator. The analysis was performed over a
class of distributions ranging from Rectangular to Cauchy.
Hogg's estimator performed better overall than the
Hodges-Lehmann estimator, which also performed very well.

It is possible to generalize Hogg's estimator in such a
manner that the estimator is a linear combination of the
sample items with weights which are continuous functions
of the sample items. The procedure is summarized below.
See Hogg (Ref 7:1184) for a more complete discussion.

If $X_1 \le X_2 \le \cdots \le X_n$ are the ordered sample values then

$$HG2 = \sum_{i=1}^{n} w_i X_i$$

where

(3-24)

(3-25)
IV. MONTE CARLO ANALYSIS

This investigation was conducted to explore the performance
of three of the estimators discussed in the previous section.
The three chosen were the Hodges-Lehmann estimator, the
Switzer estimator, and Hogg's estimator. There were two
reasons for selecting these three estimators from the many
which can be found in the available literature. First,
the Hodges-Lehmann estimator and Hogg's estimator
had demonstrated a high degree of efficiency in estimating
location parameters in much of the analysis found in the
literature. The Switzer estimator is very new and could
be found in only one article (Ref 11); it does, however,
demonstrate a new and interesting technique
for exploration. Thus, in an attempt to test the performance
of the Switzer estimator, it was necessary to select what are
generally considered the best available robust estimators
as competitors. The second reason was that these
estimators had not previously been compared against one
another for these sample sizes and probability distributions,
and that these, as with all the estimators considered
in the previous section, were computationally and theoretically
tractable.

The analysis was basically a computer exercise, and all
computations were performed on the Control Data Corporation
6600 Computer System. Five basic probability distributions
were selected which were symmetric and unimodal.

Utilizing Monte Carlo techniques, random samples of
size 12 and 24 were drawn from these five distributions.
At the outset of the analysis several larger sample sizes
were drawn, but the additional gain in information did not
prove to be worth the extra cost in computer time, so these
larger sample sizes were eliminated. Using the random
samples drawn, estimates of the location parameter were
computed using the robust estimators and also using the
known statistics which are the "best" estimators for each
of the distributions considered. The final step was to
compute, for each estimator, the variance of its estimates
about the true value of the location parameter. The
listing of the computer program designed to accomplish this
procedure can be found in Appendix B.
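The procedure can be sketched in Python as follows. This is
an illustration only and not a translation of the program in
Appendix B: the function names, the seed, and the choice of
2000 repetitions are assumptions, and Python's own generator
stands in for the CDC 6600 routine.

```python
import random
import statistics


def midrange(x):
    return (min(x) + max(x)) / 2.0


def mean_square_error(estimator, draw, true_location,
                      sample_size, repetitions, rng):
    """Average squared deviation of the estimates from the true value."""
    total = 0.0
    for _ in range(repetitions):
        sample = [draw(rng) for _ in range(sample_size)]
        total += (estimator(sample) - true_location) ** 2
    return total / repetitions


# Rectangular distribution on (0, 1), true location 0.5, samples of 12.
rng = random.Random(1972)
mse_mean = mean_square_error(
    statistics.fmean, lambda r: r.random(), 0.5, 12, 2000, rng)
mse_midrange = mean_square_error(
    midrange, lambda r: r.random(), 0.5, 12, 2000, rng)
```

For the rectangular distribution the mid range should show
the smaller mean square error, in agreement with its role as
the "best" estimator for that shape.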

Probability Distributions Used

Samples of size 12 and 24 were drawn from each of five
probability distributions. As stated earlier, each was a
symmetric distribution. The specific distributions were
selected because of their similarity, in the sense that they
could easily be mistaken for one another when a decision
maker had to base a decision on a small sample. It is
also possible for these distributions to occur in combination
with one another, thus causing further confusion.

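Rectangular.

$$F(x) = 1, \qquad 0 \le x \le 1 \tag{4-1}$$

Triangular.

$$F_1(x) = \frac{2}{(a+b)a} (x + a), \qquad -a \le x \le 0 \tag{4-2}$$

$$F_2(x) = \frac{2}{(a+b)b} (b - x), \qquad 0 \le x \le b \tag{4-3}$$

Drawings were made from this distribution with three
different parameters.

$-1 \le x \le 1$
$-5 \le x \le 5$
$-10 \le x \le 10$

Normal.

$$F(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) \tag{4-4}$$

Contaminated Normal.

10% Contamination

$$F(x) = .90 \, \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) + .10 \, \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \tag{4-5}$$

20% Contamination

$$F(x) = .80 \, \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) + .20 \, \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \tag{4-6}$$

where $\mu$ and $\sigma$ are the mean and standard deviation
of the contaminating normal distribution, read as input by the
program in Appendix B.

Double Exponential.

$$F(x) = \frac{1}{2} \exp\left( -|x| \right) \tag{4-7}$$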
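Drawings from these distributions can be sketched in Python.
The normal, triangular, and contaminated normal generators
follow the devices visible in Appendix B (a sum of twelve
uniform numbers less six, a sum of two uniform numbers, and a
uniform number compared against the contamination fraction).
The double exponential generator is an assumption, since that
part of the listing is not legible, and all function names
are hypothetical.

```python
import math
import random


def draw_rectangular(rng):
    return rng.random()  # uniform on [0, 1)


def draw_normal(rng):
    # approximate N(0, 1): sum of twelve uniforms minus six
    return sum(rng.random() for _ in range(12)) - 6.0


def draw_triangular(rng, a):
    # symmetric triangular on (-a, a): scaled sum of two uniforms
    return a * (rng.random() + rng.random() - 1.0)


def draw_contaminated(rng, pct, avg, std):
    # with probability pct the normal draw is shifted and rescaled
    z = draw_normal(rng)
    return avg + std * z if rng.random() <= pct else z


def draw_double_exponential(rng):
    # assumption: exponential magnitude with a random sign
    magnitude = -math.log(1.0 - rng.random())
    return magnitude if rng.random() < 0.5 else -magnitude
```

A call such as `draw_contaminated(rng, 0.10, 0.0, 3.0)` gives
10% contamination; the values 0.0 and 3.0 are purely
illustrative and are not taken from the thesis.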
Estimators Considered

The three robust estimators analyzed were used in the
forms stated earlier in this text, i.e., equations 3-5, 3-16,
and 3-23. Three popular statistics were also computed:
the sample mean, the sample median, and the mid range.
The form of these statistics should be familiar to anyone
with an interest in statistics.
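For reference, the Hodges-Lehmann estimator and the mid range
can be written in a few lines of Python. The sketch assumes
the usual form of the Hodges-Lehmann estimator, the median of
all pairwise averages including each observation paired with
itself; the function names are hypothetical.

```python
import statistics


def hodges_lehmann(x):
    """Median of the pairwise averages (x[i] + x[j]) / 2 with i <= j."""
    n = len(x)
    averages = [(x[i] + x[j]) / 2.0
                for i in range(n) for j in range(i, n)]
    return statistics.median(averages)


def mid_range(x):
    """Average of the smallest and largest observations."""
    return (min(x) + max(x)) / 2.0
```

A sample of size 12 produces 78 pairwise averages and a sample
of size 24 produces 300, so the direct computation is
inexpensive at the sample sizes studied here.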


Computations

The variance of each estimator with respect to the true
value of the location parameter was computed using the
mean square error. Each sample size was drawn 4200
times. The computer program was designed to compute the mean
square error every 350 repetitions and provide these values
as outputs. Appendix C contains graphs of some selected
results obtained for some of the estimators from each
distribution considered. The majority of the graphs were
omitted from this thesis to keep the size manageable,
and it was the opinion of this author that they would not
provide any meaningful information.
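The cumulative values that are graphed can be formed from the
twelve batch results as sketched below. The function name is
hypothetical; the running average over batches of 350
repetitions follows the plotting program in Appendix B.

```python
def cumulative_mse(batch_mse, batch_size=350):
    """Running average of the batch mean square errors.

    Returns (repetitions so far, cumulative mean square error) pairs.
    """
    running = []
    total = 0.0
    for j, value in enumerate(batch_mse, start=1):
        total += value
        running.append((j * batch_size, total / j))
    return running
```

Twelve batches of 350 repetitions give the 4200 drawings made
for each sample size.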

Relative efficiencies were also computed, and the results
are expressed in percentage form and presented in
Appendix D. Relative efficiency is defined here to be
the ratio of the variance of the best estimator for the
distribution considered to the variance of the estimator
whose efficiency is under consideration.
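The definition amounts to a single division, as the following
Python sketch shows. The function name is hypothetical and the
numbers in the usage note are made up for illustration; they
are not results from Appendix D.

```python
def relative_efficiencies(mse_by_estimator, best):
    """Efficiency of each estimator, in percent, relative to `best`."""
    base = mse_by_estimator[best]
    return {name: 100.0 * base / mse
            for name, mse in mse_by_estimator.items()}
```

For instance, if the best estimator had a mean square error of
2.0 and a competitor 2.5, the competitor would be listed at
80%. An estimator that happens to beat the nominal best
estimator in the simulation is listed above 100%.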
V. CONCLUSIONS

Since this thesis was intended mainly to provide a
survey of existing robust estimation techniques, the content
is not easily extrapolated to many significant conclusions.
Several interesting observations, however, were made in
the course of this study which are worthy of note.

First of all, the technique of robust estimation as it
has been presented here is not over 10 years old. The
scarcity of practical estimation techniques and the absence
of a theoretical foundation for this discipline emphasize
its newness. The fact that investigation in this broad
and interesting area of statistics has hardly scratched
the surface is a conclusion worthy of mention. There
appear to be approximately ten theoretical statisticians
who are doing the majority of the research in this area.
Their names can be found in the bibliographies at the end
of this thesis. The amount of duplicated effort evident
in the literature is testimony to the infancy of this
discipline.

Several interesting conclusions can be made based on
the results of the Monte Carlo analysis summarized in
Tables I thru VI in Appendix D. As stated earlier, these
estimators were designed to estimate the location parameter
of a symmetric distribution. The relative efficiencies
presented in Appendix D denote the performance of each
estimator relative to the "best" estimator for that distribution.
Obviously, if the exact form of the underlying distribution
were known, the "best" estimator could easily be selected.
Suppose, however, that a sample of size 12 was drawn
from either a Normal, Contaminated Normal, or Double
Exponential distribution, with equal probability. Tables
I and V show that if the Sample Mean were selected to
estimate the location parameter, the highest efficiency that
could be achieved would be 100% and the lowest efficiency
would be 72.7%. If, however, the Hodges-Lehmann estimator
were chosen, the highest efficiency would be 100%
(efficiencies in the Tables greater than 100% are taken here
as 100%) and the lowest 92.1%. If Hogg's estimator were
chosen, the high would be 98.5% and the low 78%.

Now consider all five distributions from Tables I thru VI.
Suppose the Mid Range was selected as the estimator of
the location parameter. Then the efficiency would range
from 100% to 6.2%. Once again, if the Hodges-Lehmann
estimator were chosen, the efficiency would range from
100% to 32.5%. The Hodges-Lehmann estimator is truly
superior to the Mid Range when the efficiencies are compared
for each distribution. The data presented in Appendix
D show that, for the estimators and distributions considered,
the robust estimators are superior.

Several minor conclusions are worthy of mention here.
First of all, a comparison of the efficiencies for sample
sizes 12 and 24 shows that as the sample size gets larger
the "best" estimator gets better and the efficiency of the
robust estimator decreases. This is to be expected,
however, since the robust estimators were designed for, and
have value only for, small sample sizes. Another conclusion
of some import is that varying the scale parameter of the
underlying distribution has no effect on the relative
efficiencies of the estimators of the location parameter.
This is evident from Tables I, II, III, and IV.

The final conclusion has more application in the area
of Monte Carlo techniques than robust estimation techniques.
During the course of this investigation there was some
question as to the number of times each sample should
be drawn. The low figure was approximately 500 drawings
and the high figure 5000. The graphs presented in Appendix
C show that for all practical purposes 1000 repetitions
would be sufficient and that any number over 2000 is just not
worth the computer time.

Areas For Further Investigation

Initially the author of this paper felt that the Switzer
estimator would have the most efficient form. The choice
of competing forms of the estimator did not bear out this
premise, as was demonstrated in the analysis. If, however,
a more judicious choice of competing estimators were made,
the performance of this estimator might be significantly
enhanced. This area, plus the possibility of analyzing
the performance of these robust estimators when the
restriction of a symmetric underlying distribution does
not apply, could be extremely fruitful for further
study.
      PROGRAM MAIN(INPUT,OUTPUT,PLOT)
C     READS THE MEAN SQUARE ERRORS PUNCHED EVERY 350 REPETITIONS,
C     FORMS THEIR CUMULATIVE AVERAGE, AND PLOTS IT FOR EACH ESTIMATOR
      DIMENSION X(20)
      DIMENSION Y(20)
      DIMENSION Z(20)
      CALL PLOT(1.,2.,-3)
      READ 500,N
      DO 5 JJ=1,4
      DO 3 II=1,6
      DO 1 I=1,N
    1 READ 501,X(I)
      TEMP=0.
      DO 2 J=1,N
      TEMP=X(J)+TEMP
      Y(J)=TEMP/J
      GO TO(10,11,12,13,14,15),II
    4 M=J*350
      Z(J)=J
    2 PRINT 502,M,Y(J)
      CALL GRAPH(Z,Y,N)
    3 CONTINUE
    5 CONTINUE
      GO TO 7
   10 PRINT 601
      GO TO 4
   11 PRINT 602
      GO TO 4
   12 PRINT 603
      GO TO 4
   13 PRINT 604
      GO TO 4
   14 PRINT 605
      GO TO 4
   15 PRINT 606
      GO TO 4
    7 CONTINUE
  601 FORMAT(28X*HODGES LEHMANN ESTIMATOR*)
  602 FORMAT(28X*HOGGS ESTIMATOR*)
  603 FORMAT(28X*SWITZERS ESTIMATOR*)
  604 FORMAT(28X*SAMPLE MEAN*)
  605 FORMAT(33X*SAMPLE MEDIAN*)
  606 FORMAT(25X*MID RANGE*)
  500 FORMAT(I3)
  501 FORMAT(F14.9)
  502 FORMAT(//,28X*AFTER *I4* REPETITIONS*//,26X,F14.10)
      END
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18 


20 

50 


501 

51? 


PRCGR  ft  H  «A  I N  ( r  M  ?UT ,  OUT  PUT  ,  oi;«CH) 

Rrfto  ‘;oo,flyG,PCT,sTo,a,M 
RE40  501, s  ’  * 

CALt  RA»iS^T(S) 

X=S 

PPIHT  600, X 
00  50  JA0=1,12 

ORTMT  601,AVG,oCT,STO,i:,N 
00  18  JJ=12,2<,,12 
VoiJJ)=0. 

VC(JJ)=0. 

VT{jj)=o. 

vr<jj)=n. 

vtl{jj)r:0, 

CCNT1HUH 
00  1  JI=1,M 
00  1  1  =  1?,2<„12 
Y=JI 
K  =  I 

on  2  Jsi,x 

X=PAWf (y) 

RA«OtJ)=X 

CAtL  GAUS5(K, JI,«?oP«) 

CALI  EXPO*i(K,rxP,jJ» 

call  AH00GtP,ftt;0,<, As,0RC1) 

CALL  Af«OOG(COfJT,<,  ACO=D) 

CALL  AhOOG(TPIAS,<,STORO) 

CALL  AUODG iogvQj 
CALL  ftH00L<tXP,<,JEX?0) 

VPOCJ  =  ((APO-r)-,5)»*2)»VR{IC| 

VC f K) 2 ( *  2 ) ^VC (X) 
vt  <K>  =  (fiTOPC)**?)  *'n  «) 

Vf:{K)  =  (C30'iO’-2J  ♦yn«i 

VF(K)S<aCX3p»,?| 

CCNTiUUt 

PO  20  X<=12,2«,,i2 

V«»tK<l=vP(KK>/V 

VC(X»;)syC(«)/N> 

VUKX)=VT(«)/?,' 

voccosv'axx)/^ 

V*’ <<•'')='/£(<<) /.; 

P^lMT  51  ?f *J, X<,  VPtXO  ,  VC(KX)  . VT C<«'1 

CONTI  f;uE 
CftLL  RANGETIT) 

PO*.’CH  501,Y 

SX-RrpJ{fJ{o:.3*;.:J:;-5“J  *J3,/,25 

t  *2*//»2iX*hCDoES  LtPKiNN*//,23x,5F 

^4^.01  * 


5IA.Q) 

601  P0‘'XiT(2X,AF:5.g,j3, 
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PW3RAM  MAI NdNOUT, OUTPUT, *>Uf4CH> 

0IH;»4SI0fl  X?l(53)  ,XC(5G)  ,y T (53 )  , Xt.'(5C >  , X£  (50) 

OlMiNSIO'l  3=1(53)  ,30(53)  ,ST(53)fSN(53)  ,SE  (5D> 

OIMEHSIOM  RAN0(43)  ,=13=?M<<,3)  ,C0NT(-.3)  ,TRI  AM  { (.9)  , ROP.3  (A6) 
DIHEfiJIOM  HR(53)  ,  J0(53)  ,HT(50)  ,f(M(5u)  ,H£(5j) 

OIMrMSIOM  ;XP(l«9)  ,£X»00(1.3) 

OIMEMSION  =10M0{«,3)  ,CO=iO(«.3) , TOP.O (A3) 

OlMcMSlOfl  STATR{8)  ,Sr4TC(S)  ,STATT(3)  ,STATM(S)  , STATE  (8) 
OIMEHSI3M  RR(53)  ,RC(50)  ,:>T(5j)  ,=i«(5:>  ,P.E(53> 

OIHEMSIOM  r=((53) ,YC(53),yr(50),YN(5Ji .YEISO) 

READ  500,AyG,PSr,STO,A,N 
READ  5C1,S 
CALL  RANS£r(S) 

X=S 

PRIHT  600, X 

PRIMT  631,AVG,PCT,SrO,A,M 
C     CLEAR THE ACCUMULATORS FOR SAMPLE SIZES 12 AND 24
      DO 18 JJ=12,24,12
      SE(JJ)=0.
      SN(JJ)=0.
      ST(JJ)=0.
      SC(JJ)=0.
      SR(JJ)=0.
      YR(JJ)=0.
      YC(JJ)=0.
      YT(JJ)=0.
      YN(JJ)=0.
      YE(JJ)=0.
      XR(JJ)=0.
      XC(JJ)=0.
      XT(JJ)=0.
      XN(JJ)=0.
      XE(JJ)=0.
      HR(JJ)=0.
      HC(JJ)=0.
      HT(JJ)=0.
      HN(JJ)=0.
      HE(JJ)=0.
      RR(JJ)=0.
      RC(JJ)=0.
      RT(JJ)=0.
      RN(JJ)=0.
      RE(JJ)=0.
   18 CONTINUE

00  1  UI  =  1,'4 
00  1  1=12, 2A, 12 
Y=J1 
X=1 

00  2  J=1,X 
XsRAtJr  (Y) 

RaN0(J)=X 

call  GAUS5(K, JI,RORf() 
call  CNDRMISTOfPCTtK.AVG.RORM.COJiT, JI> 
call  TRIAn3(A,>(,TRIA*(,JI) 
call  EXPOfi{K,£X?,JI) 
call  80S(RAN0,K,STATR) 
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30SfP3io‘'*'^'STdrT) 

I* iS-.'s’ii?-  . 

paV.*- 

call  fl!i4;;3r,3°’3*^»C03D) 

L  A.^4.vc£ ! t'ot^:. ■^* 

call  4^4‘C£  ( .-x? 
call 

call  S‘J£0({Jj3’^'?"=3T> 

''?«<sr«r? 


''♦'Ll  Hojsf 5ta,- *  ' ?0“-«  ’ 

CALI  Al£i4Mf*'v  “>'^|HCVJ 

call  A*ilO(f-,~^''»'«C&J 

®««-L  A.S‘^^.!?»<.^Tr.) 

-  s  (  <33-. 

^^<♦0  ={3c*'*»3^*  *  *^J  <^j 

■‘^‘»«»  =  ((-,.~3f  *-»♦''£«) 

CO.VrjN-yj  ^  ^^♦S£(iC) 
C     CONVERT THE ACCUMULATED SQUARED ERRORS TO MEAN SQUARE ERRORS
      DO 20 KK=12,24,12
      RR(KK)=RR(KK)/N
      RC(KK)=RC(KK)/N
      RT(KK)=RT(KK)/N
      RN(KK)=RN(KK)/N
      RE(KK)=RE(KK)/N
      YR(KK)=YR(KK)/N
      YC(KK)=YC(KK)/N
      YT(KK)=YT(KK)/N
      YN(KK)=YN(KK)/N
      YE(KK)=YE(KK)/N
      XR(KK)=XR(KK)/N
      XC(KK)=XC(KK)/N
      XT(KK)=XT(KK)/N
      XN(KK)=XN(KK)/N
      XE(KK)=XE(KK)/N
      HR(KK)=HR(KK)/N
      HC(KK)=HC(KK)/N
      HT(KK)=HT(KK)/N
      HN(KK)=HN(KK)/N
      HE(KK)=HE(KK)/N
      SE(KK)=SE(KK)/N
      SN(KK)=SN(KK)/N
      ST(KK)=ST(KK)/N
      SC(KK)=SC(KK)/N
      SR(KK)=SR(KK)/N
      PRINT 502,N,YR(KK),YC(KK),YT(KK),YN(KK),YE(KK)
     $,HR(KK),HC(KK),HT(KK),HN(KK),HE(KK)
     $,XR(KK),XC(KK),XT(KK),XN(KK),XE(KK),RR(KK),RC(KK),RT(KK),RN(KK),
     $RE(KK),SR(KK),SC(KK),ST(KK),SN(KK),SE(KK)
   20 CONTINUE
      CALL RANGET(Y)
      PUNCH 501,Y

501  FDRf'AT<flA.q) 

500  F0R'?fiT(F5.2,^5. 2,F6.3»F5.2,13) 

502  FGR‘!iT(23X«R£P£TIII0'.S=*I3,/,?5X*;'iRi;>.3£  E ST IM4TE ( H£fiN  SOUfiRE  ERR 

S0R>  FOR  SAXFLE  1E3I 2 3X, 5F1 A . 5, //, 2 SX* FOR  H0G3S  ESTIHATOR*// t 
52eX,5?l«*.5,//f23X*F0R  MEAN*// ,  25x,  5r  lA  .9 ,//,  25X*  FOR  RID  RAN 

$GE*//,23X,5F i<..9,//23X»FCR  S»IT2£RS  cSTIRiTCR*//,  23  X,  5F14  ,9) 

510  F0RKAT<1X,$F15.9) 

600  F0RRAT(1X,F15.9) 

601  C0RRAT(2X,«.F15.9,I3) 

ENO 

      SUBROUTINE AMID(X,K,B)
C     MID RANGE OF THE ORDERED SAMPLE X(1),...,X(K)
      DIMENSION X(48)
      B=(X(1)+X(K))/2.
      RETURN
      END

      SUBROUTINE SMED(ARRAY,K,AMED)
C     MEDIAN OF AN ORDERED SAMPLE OF EVEN SIZE K
      DIMENSION ARRAY(48)
      N=K/2
      M=(K/2)+1
      AMED=(ARRAY(N)+ARRAY(M))/2.
      RETURN
      END
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00  10  1=1, NCP 
1P=I«1 

00  10  J-IP,K 

{ewiJai-'’"-'’’”  ">  »• 

XCDrXIJ) 

XCJ)=T£MP 
0  WN5c(l)rX(I) 
range (K)=X(X) 

RcTU.RN 

END 

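      SUBROUTINE SWIT(X,K,SW)
C     SWITZER ESTIMATOR USING BLOCKS OF SIX OBSERVATIONS
      DIMENSION A(8),B(8),C(8),X(48)
      L=K/6
      SQA=0.
      SQB=0.
      SQC=0.
      DO 50 I=1,L
      N=I*6
      M=N-5
      NN=N-1
C     ORDER THE SIX OBSERVATIONS OF BLOCK I
      DO 56 II=M,NN
      JK=II+1
      DO 56 KK=JK,N
      IF(X(II).LE.X(KK))GO TO 56
      TEMP=X(II)
      X(II)=X(KK)
      X(KK)=TEMP
   56 CONTINUE
      MM=M+1
      NM=N-1
      MI=M+2
      MJ=N-2
C     THE THREE MID RANGES OF BLOCK I
      A(I)=(X(M)+X(N))/2.
      B(I)=(X(MM)+X(NM))/2.
      C(I)=(X(MI)+X(MJ))/2.
      SQA=A(I)+SQA
      SQB=B(I)+SQB
   50 SQC=C(I)+SQC
      SQA=SQA/L
      SQB=SQB/L
      SQC=SQC/L
      DENOM=L*(L-1)
      SSQA=0.
      SSQB=0.
      SSQC=0.
      DO 51 JJ=1,L
      SSQA=(A(JJ)-SQA)**2+SSQA
      SSQB=(B(JJ)-SQB)**2+SSQB
   51 SSQC=(C(JJ)-SQC)**2+SSQC
      SSQA=SSQA/DENOM
      SSQB=SSQB/DENOM
      SSQC=SSQC/DENOM
C     SELECT THE FORM WITH THE SMALLEST ESTIMATED VARIANCE
      IF(SSQA.LE.SSQB)GO TO 52
      IF(SSQB.LE.SSQC)GO TO 54
   53 SW=SQC
      GO TO 55
   52 IF(SSQC.LT.SSQA)GO TO 53
      SW=SQA
      GO TO 55
   54 SW=SQB
   55 CONTINUE
      RETURN
      END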
      SUBROUTINE HOGG(X,B,C,K,A)
C     HOGGS ESTIMATOR.  X HOLDS THE SAMPLE STATISTICS, B THE SAMPLE
C     MEDIAN, C THE ORDERED SAMPLE OF SIZE K, A THE RESULT
      DIMENSION X(8),C(48)
      X(8)=X(3)+3.
      IF(X(8).GT.4.)GO TO 35
      IF(X(8).LT.2.)GO TO 36
      A=X(1)
      GO TO 40
   35 IF(X(8).LE.5.5)GO TO 37
      A=B
      GO TO 40
C     MEAN OF THE K/4 SMALLEST AND K/4 LARGEST ITEMS
   36 A=0.
      L=K/4
      M=K-(L-1)
      DO 38 I=M,K
   38 A=C(I)+A
      DO 39 J=1,L
   39 A=C(J)+A
      A=A/(2*L)
      GO TO 40
C     MEAN OF THE REMAINING INTERIOR ITEMS
   37 A=0.
      L=K/4
      M=(K/4)+1
      MM=K-L
      DO 34 N=M,MM
   34 A=C(N)+A
      A=A/(K/2)
   40 CONTINUE
      RETURN
      END

      SUBROUTINE GAUSS(K,JI,RORM)
C     K APPROXIMATE NORMAL(0,1) DEVIATES, EACH THE SUM OF TWELVE
C     UNIFORM NUMBERS LESS SIX
      DIMENSION RORM(48)
      ZZ=JI
      DO 61 I=1,K
      X=0.
      DO 60 J=1,12
   60 X=RANF(ZZ)+X
      X=X-6.
   61 RORM(I)=X
      RETURN
      END

      SUBROUTINE TRIANG(A,K,TRIAN,JI)
C     K TRIANGULAR DEVIATES ON (-A,A), EACH FROM TWO UNIFORM NUMBERS
      DIMENSION TRIAN(48)
      YY=JI*2
      DO 7 I=1,K
      X=0.
      DO 8 J=1,2
    8 X=RANF(YY)+X
      TRIAN(I)=A*(X-1.)
    7 CONTINUE
      RETURN
      END
      SUBROUTINE CNORM(STD,PCT,K,AVG,RORM,CONT,JI)
C     CONTAMINATED NORMAL.  WITH PROBABILITY PCT THE NORMAL DEVIATE
C     IS REPLACED BY ONE WITH MEAN AVG AND STANDARD DEVIATION STD
      DIMENSION CONT(48),RORM(48)
      YY=JI*3
      DO 4 L=1,K
      X=RANF(YY)
      IF(X.GT.PCT)GO TO 5
      IF(X.LE.PCT)GO TO 6
    5 CONT(L)=RORM(L)
      GO TO 4
    6 CONT(L)=AVG+STD*RORM(L)
    4 CONTINUE
      RETURN
      END

      SUBROUTINE SORT(A,K)
C     SORTS A(1),...,A(K) INTO ASCENDING ORDER BY EXCHANGES
      DIMENSION A(1)
      LOGICAL SWITCH
      IF(K.EQ.1) RETURN
      J1=1
      J2=K-1
    1 SWITCH=.FALSE.
      DO 2 J=J1,J2
      IF(A(J).LE.A(J+1))GO TO 2
      T=A(J+1)
      A(J+1)=A(J)
      A(J)=T
      J4=J
      IF(SWITCH)GO TO 2
      J3=J
      SWITCH=.TRUE.
    2 CONTINUE
      IF(.NOT.SWITCH) RETURN
      J1=MAX0(1,J3-1)
      J2=MAX0(1,J4-1)
      GO TO 1
      END

SUa^OUTINE  FXPON«|AtJl> 

OIHFNSION  A(:.8) 
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SUOROUTlMt  AHC05(ft,t(,awS) 

no  1  1=1, If 
n<i>=a(i) 

CALL  SORTta,K> 

Il=if/2  * 

00  2  1=1,11 
C(I1=9<I>+9(K«1+i) 

CAUL  S0RT(C,</2) 

Cl=C(*f/<,J 
C2=C«/<,  +  l) 

1P=0 

Nl=N2=M3=fl4=fJS=0 
00  19  1=1, < 

00  9  J=1,K 
T=9<n49(J) 

1FIT-C1)8,S,3 

lFlt-C2)7,6,4 
M5=N5*<-J*i 
GO  TO  19 
N2=*<2«1 
GO  TO  9 
N4=K«i41 
GO  TO  9 
M?=K341 

0(*13)=T 
GO  TO  9 
NlrfJi*! 

COfJTiMUc 

COlJTTIIUi 

M=K*(K41)/«, 
ir<f.‘3.£O.0>  GO  TO  13 

TO  16 

lP=lpii^^*”* "^0  19 

Cl=C(</f,-.IO| 

C2=C(lf/«,*l?4i) 

GO  JO  12 

1F<(»14N2),ne,(H«,4NS))C0  24, 

AvS=fCl4C2>*.25 

RFTUPf; 

CfU  S09T(0,N3) 

Ir=H-fJi-f;? 

Cl=C(IC>) 

02=0(1041) 

GO  TO  14, 

CALL  SCRT(0,f43) 

10=»*-?41-V2 

IFHK14}42).GT.M)G0  to  21 

IF<70,M£,0)  C1=0(IP) 

3^*10. f;£,N3l  C2=0(i04l| 

GO  TO  14i 
CJ=C2 


60  TO  14, 

C2=C1 
60  TO  14 
END 
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SUS;>CIITI.‘1£  6=?A?H(Z,Y,U) 

01K£N?10M  Z{1«.) 
call  SC4Lt(Z,3.5,‘J,l) 

CtLL  5:CALE{Y,5,0,J,1) 

call  AXIS(0  .  Oi  0.0,15H3£P£TIT10f:’;*35a,-15,3 
CALL  AXIS{0,  J,0.0,27HV4!^lMfJCt<K£A«j  SQUARt 
S)  »y  (t)«2i ) 

CALL  LIN£(2,Y,V,1,1,11) 

CALL  PLOT(10.,0.,-3) 

RFTURN 


•  CtC.3,2(N»l),7(N«2)} 
£RRCR),27,5.0,9C.0,Y(N*1 
The thirty graphs presented in this section were selected
from several hundred which were generated in the course
of this investigation. One graph was chosen for each of
the six estimators considered from each of the five
distributions. The values plotted along the abscissa,
multiplied by 350, give the number of times the sample was
drawn. The values along the ordinate are the cumulative values
for the mean square error. The graphs are labeled at
the bottom by ESTIMATOR/PROBABILITY DISTRIBUTION/SAMPLE SIZE.
[Graphs not reproduced. Abscissa: REPETITIONS*350. Labels of the graphs that survive:]

HODGES-LEHMANN/NORMAL(0,1)/12
HODGES-LEHMANN/DOUB. EXPONENTIAL/12
HOGG'S/RECTANGULAR/12
HOGG'S/TRIANGULAR/12
HOGG'S/NORMAL(0,1)/12
HOGG'S/DOUB. EXPONENTIAL/12
SAMPLE MEAN/TRIANGULAR/12
SAMPLE MEAN/DOUB. EXPONENTIAL/12
SAMPLE MEDIAN/RECTANGULAR/12
SAMPLE MEDIAN/TRIANGULAR/12
SAMPLE MEDIAN/NORMAL(0,1)/12
SAMPLE MEDIAN/DOUB. EXPONENTIAL/12
MID RANGE/TRIANGULAR/12
MID RANGE/NORMAL(0,1)/12
All efficiencies recorded in the following tables are
efficiencies relative to the best estimator considered for
that particular distribution. The best estimator, which
was used as the base, is listed as 100%. In the case of
the contaminated normal distribution there are efficiencies
greater than 100% recorded. This is because some of the
robust estimators actually performed slightly better than
the sample mean, which was taken as the best estimator
for that distribution.

Table II

Relative Efficiencies For Sample Size 24